Deep Learning

[452SM]
a.a. 2024/2025

1st year of the course - Second semester

Attendance: not mandatory

  • 6 CFU
  • 48 hours
  • English
  • Trieste
  • Mandatory
  • Standard teaching
  • Oral Exam
  • SSD ING-INF/05
  • Advanced concepts and skills
Curricula: COMPUTATIONAL MODELING AND DIGITAL TWINS
Syllabus

This course will study the building blocks of deep neural networks and how to use them to construct and train powerful architectures for classification, regression and probabilistic modelling.

Knowledge and understanding: basic and advanced topics on the design, training, debugging and performance quantification of neural networks; analysis of representations; knowledge of the PyTorch deep learning library.

Applied knowledge and understanding: be able to tackle a complex data modelling problem and construct effective approaches based on deep learning. Combine different approaches (architectures, training strategies, etc.) to improve performance and/or robustness. Be able to use and modify, if necessary, state-of-the-art models.

Autonomy of judgement: be able to apply deep learning models critically, identifying the most effective approaches to solve a given problem. Be able to critically compare different models and assess their relative strengths, weaknesses and effectiveness. Be able to calibrate the complexity of models to the complexity of problems.

Communication skills: Be able to explain the ideas of deep learning and communicate the results to a knowledgeable audience.

Learning skills: be able to study advanced topics independently; be able to critically read the literature (both books and scientific articles) and resources available online, including those related to advanced programming and core frameworks; be able to apply deep learning models correctly and critically to a variety of problems; be able to monitor and debug the training process.

Basic knowledge of Python, probability, and machine learning, as covered in introductory courses.

The course provides an extensive treatment of deep learning starting
from the basics to advanced topics.

- Introduction to deep learning
- Artificial neurons, shallow networks, deep networks
- Loss functions
- Gradient descent and its variants
- Backpropagation
- Regularization
- Convolutional neural networks
- Recurrent neural networks
- Attention models and transformers
- Introduction to deep unsupervised learning
- Analysis of representations in deep learning models

- Understanding Deep Learning, Simon J.D. Prince, MIT Press, 2023.
- Pattern Recognition and Machine Learning, C.M. Bishop, Springer, 2006.

Introduction to deep learning: supervised and unsupervised problems.
Supervised learning.
Analysis of linear regression. Shallow neural networks. Non-linearities. Universal approximation theorems. Scaling of the number of linear regions. Multivariate inputs and outputs.
Deep neural networks. Expressivity and general scaling of the number of linear regions. Shallow vs deep networks. Notion of representation.

Loss functions. Maximum likelihood estimation. Construction of loss functions and examples. Cross-entropy loss.
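
As a concrete illustration of the maximum-likelihood view of loss functions, the sketch below computes the cross-entropy loss for a single classification example in plain Python (an illustrative example, not course material; the course itself uses PyTorch):

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy loss for one example: the negative log-probability
    assigned to the true class, i.e. the maximum-likelihood objective."""
    return -math.log(probs[target])

# A confident correct prediction gives a small loss,
# an unconfident one a larger loss.
confident = cross_entropy([0.9, 0.05, 0.05], target=0)
uncertain = cross_entropy([0.4, 0.3, 0.3], target=0)
```

Minimizing this loss over a dataset is equivalent to maximizing the likelihood of the observed labels under the model.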

Gradient descent. Stochastic gradient descent. Momentum. Adam. Hyperparameter tuning.
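
The momentum update can be sketched in a few lines (a minimal illustration, using the common PyTorch-style formulation in which the velocity accumulates gradients; step sizes and the toy objective are arbitrary choices for this example):

```python
def sgd_momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One step of SGD with momentum: v is an exponentially weighted
    running sum of past gradients, and w moves along -lr * v."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Minimise f(w) = w**2 (gradient 2w) starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, v, grad=2.0 * w)
```

With a well-chosen learning rate and momentum coefficient, the iterates spiral into the minimum at w = 0.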

Curse of dimensionality. Manifold hypothesis. Dimensionality of representations. Blessings of dimensionality. Concentration phenomena in high-dimensional spaces.

Gradients. Backpropagation. Initialization of parameters, derivation of the He initialization and its motivations.
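
The He initialization mentioned above can be sketched directly from its definition: weights are drawn from a zero-mean Gaussian with variance 2/fan_in, which keeps the variance of pre-activations roughly constant across ReLU layers (a pure-Python illustration; deep learning libraries provide built-in versions):

```python
import math
import random

def he_init(fan_in, fan_out, rng=None):
    """He initialization: weights ~ N(0, 2 / fan_in), derived so that
    ReLU layers preserve the variance of activations during the
    forward pass."""
    rng = rng or random.Random(0)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

# A weight matrix for a layer mapping 512 inputs to 256 outputs.
W = he_init(fan_in=512, fan_out=256)
```

The empirical variance of the sampled weights should be close to 2/512 ≈ 0.0039.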

Measuring performance. Monitoring the training process. Sources of error: bias, variance and noise. Double descent phenomenology. Overparametrized and underparametrized regimes: detailed discussion of the double descent phenomenon in the linear regression case, with the aid of the singular value decomposition (SVD).

Diagnostic of the source of error and heuristics to improve performance.

Regularization: explicit and implicit.

Convolutional networks for 1D and 2D inputs. Applications to object recognition and image segmentation. Data augmentation strategies and their mathematical interpretation. Inspiration from biology: receptive fields, Hubel and Wiesel model.

Residual networks. Residual connections and residual blocks. Vanishing gradient solution. Interpretation as a network ensemble. Exploding gradients and batch normalization. Residual architectures.
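
The defining computation of a residual block, output = x + F(x), is simple enough to sketch in plain Python (an illustrative toy with a single linear-plus-ReLU branch; real residual blocks stack several layers):

```python
def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, W, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def residual_block(x, W, b):
    """Residual block: output = x + F(x). The identity path carries
    the signal (and, during training, the gradient) unchanged,
    mitigating vanishing gradients in deep stacks."""
    fx = relu(linear(x, W, b))
    return [xi + fi for xi, fi in zip(x, fx)]

# With zero weights the block reduces exactly to the identity map.
x = [1.0, -2.0, 3.0]
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
y = residual_block(x, W, b)
```

Because the block equals the identity when F is zero, deep residual stacks are easy to initialize near a well-behaved map.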

Unsupervised learning: Taxonomy. Variational autoencoders. Latent variable models. Training. ELBO and its properties. Variational approximation. The VAE. Reparametrization trick.
Diffusion models: architectures and applications. Encoder and decoder. Training. Reparametrization of the loss.
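
The reparametrization trick mentioned above can be illustrated in a few lines: instead of sampling z ~ N(mu, sigma²) directly, one samples eps ~ N(0, 1) and sets z = mu + sigma · eps, so the randomness is isolated from the parameters (a standalone numerical sketch; in a VAE, mu and log_var would be outputs of the encoder):

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1). The noise lives in eps,
    so gradients can flow through mu and log_var during training."""
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * rng.gauss(0.0, 1.0)

rng = random.Random(0)
# Samples should follow N(mu=2.0, sigma^2=0.25).
samples = [reparameterize(2.0, math.log(0.25), rng) for _ in range(20000)]
```

The empirical mean and variance of the samples match the target Gaussian, while each sample remains a differentiable function of mu and log_var.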

Language models. Recurrent neural networks. Embeddings. Training. Teacher forcing. Autoregressive generation. Encoder-decoder architectures for sequence-to-sequence problems. Dot-product attention. Attention maps.
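
Scaled dot-product attention reduces to a short computation: attention weights are the softmax of Q·Kᵀ/√d, and the output is the weighted sum of the values. A pure-Python sketch on tiny lists (illustrative only; real implementations are batched tensor operations):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: for each query, take the softmax
    of its scaled dot products with the keys, then average the values
    with those weights."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# A query aligned with the first key attends almost entirely
# to the first value.
Q = [[10.0, 0.0]]
K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, K, V)
```

The row of attention weights for each query is exactly what is visualised in the attention maps discussed in the course.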

Attention mechanisms: implicit (Grad-CAM) and explicit.

Transformer. Dot-product self-attention. Transformer layers. Encoder models. Decoder models. Encoder-decoder models. Transformers for images.

Lectures and hands-on sessions, both individual and in groups; the balance will be roughly 70% lectures and 30% hands-on sessions. Hands-on activities typically involve experimenting with PyTorch and developing, using, or testing tools that implement the methodologies seen during lectures. During the lectures, homework exercises, both theoretical and coding-based, will be assigned, with submission deadlines of approximately two weeks. Handing in homework is not compulsory, but homework may be discussed during the final oral evaluation.


The exam will consist of two parts:
1. a group project, ideally in groups of 2 to 3 students (exceptions are possible for large projects, up to a maximum of 4). Each group will work on a well-defined set of tasks, typically analysing a complex dataset or investigating and experimenting with a methodology not covered in detail during the lectures. The group will give a presentation (30 minutes), with supporting slides or at the blackboard (depending on the case), explaining the work done, and will provide commented code upon request. The topic of the project has to be proposed by the group of students and validated by the lecturer. The main points of evaluation are the clarity and comprehensiveness of the presentation, understanding of the topic, and the depth and originality of the analyses performed.

2. an oral interview (30 minutes) in which a few questions will be asked to assess the individual contributions to the project and the level of understanding of the topics of the course. The main points of evaluation are the clarity and precision of the answers, technical understanding of the methods, and understanding of their conditions of applicability.

Both parts take place in the same session, with all group members present.

The final mark will reflect both the project and the oral part. Honors (lode) may be awarded for an exceptional exam.